Abstract

We present a generalization of the Viterbi algorithm for identifying the path with minimal
(resp. maximal) weight in an n-tape weighted finite-state machine (n-WFSM) that accepts a given
n-tuple of input strings $\langle s_1, \ldots, s_n \rangle$. It also allows us to compute the best transduction of a given
input n-tuple by a weighted (n+m)-WFSM (transducer) with n input and m output tapes. Our
algorithm has a worst-case time complexity of $O(|s|^n |E| \log(|s|^n |Q|))$, where $n$ and $|s|$ are the
number and average length of the strings in the n-tuple, and $|Q|$ and $|E|$ the number of states and
transitions in the n-WFSM, respectively. A straightforward alternative, consisting of intersection
followed by classical shortest-distance search, operates in $O(|s|^n (|E| + |Q|) \log(|s|^n |Q|))$ time.
1 Introduction

The topic of this paper is situated in the areas of multi-tape or n-tape weighted finite-state machines 
(n-WFSMs) and shortest-path problems. 

n-WFSMs (Rabin and Scott, 1959; Elgot and Mezei, 1965; Kay, 1987; Harju and Karhumäki, 1991;
Kaplan and Kay, 1994) are a natural generalization of the familiar finite-state acceptors (one tape) and
transducers (two tapes). The n-ary relation defined by an n-WFSM is a weighted rational relation.

Finite relations are of particular interest since they can be viewed as relational databases. A finite-state
transducer (n = 2) can be seen as a database of string pairs, such as (spelling, pronunciation) or
(French word, English word). Unlike a classical database, a transducer may even define infinitely many
pairs. For example, it may characterize the pattern of the spelling-pronunciation relationship in such
a way that it can map even the spelling of an unknown word to zero or more possible pronunciations
(with various weights), and vice versa. n-WFSMs have been used in the morphological analysis of
Semitic languages, to synchronize the vowels, consonants, and templatic pattern into a surface form
(Kay, 1987; Kiraz, 2000).

Classical shortest-path algorithms can be separated into two groups: those addressing single-source
shortest-path (SSSP) problems, such as Dijkstra's algorithm (Dijkstra, 1959) or the Bellman-Ford
algorithm (Bellman, 1958; Ford and Fulkerson, 1956), and those addressing all-pairs shortest-path
(APSP) problems, such as the Floyd-Warshall algorithm (Floyd, 1962; Warshall, 1962). SSSP algorithms
determine a minimum-weight path from a source vertex of a real- or integer-weighted graph to all its
other vertices. APSP algorithms find shortest paths between all pairs of vertices. For details of
shortest-path problems in graphs see (Pettie, 2003), and in semiring-weighted finite-state automata
see (Mohri, 2002).

We address the following problem: in a given n-WFSM we want to identify the path with minimal
(resp. maximal) weight that accepts a given n-tuple of input strings $\langle s_1, \ldots, s_n \rangle$. This is of particular
interest because it also allows us to compute the best transduction of a given input n-tuple by a
weighted (n+m)-WFSM (transducer) with n input and m output tapes. For this, we identify the best
path accepting the input n-tuple on its input tapes, and take the label of the path's output tapes as
the best output m-tuple.

A known straightforward method for solving our problem is to intersect the n-WFSM with another
one that contains a single path labeled with the input n-tuple, and then to apply a classical SSSP
algorithm, ignoring the labels. We show that such an intersection together with Dijkstra's algorithm
has a worst-case time complexity of $O(|s|^n (|E| + |Q|) \log(|s|^n |Q|))$, where $n$ and $|s|$ are the number
and average length of the strings in the n-tuple, and $|Q|$ and $|E|$ the number of states and transitions
of the n-WFSM, respectively.

We propose an alternative approach with lower complexity. It is based on the Viterbi algorithm,
which is generally used for detecting the most likely path in a Hidden Markov Model (HMM) for
an observed sequence of symbols emitted by the HMM (Viterbi, 1967; Rabiner, 1990; Manning and
Schütze, 1999). Our algorithm is a generalization of Viterbi's algorithm such that it deals with an
n-tuple of input strings rather than with a single input string. In the worst case, it operates in
$O(|s|^n |E| \log(|s|^n |Q|))$ time.

This paper is structured as follows. Basic definitions of weighted n-ary relations, n-WFSMs, 
HMMs, and the Viterbi algorithm are recalled in Section 2. Section 3 adapts the Viterbi algorithm to 
the search of the best path in a 1-WFSM that accepts a given input string, and Section 4 generalizes it 
to the search of the best path in an n-WFSM that accepts an n-tuple of strings. Section 5 illustrates 
our algorithm on a practical example, the alignment of word pairs (i.e., n = 2), and provides test 
results that show a time complexity slightly higher than $O(|s|^2)$. The above-mentioned classical
method for solving our problem is discussed in Section 6. Section 7 concludes the paper.

2 Preliminaries 

We recall some definitions about n-ary weighted relations and their machines, following the usual 
definitions for multi-tape automata (Elgot and Mezei, 1965; Eilenberg, 1974), with semiring weights
added just as for acceptors and transducers (Kuich and Salomaa, 1986; Mohri, Pereira, and Riley, 
1998). For more details see (Kempe, Champarnaud, and Eisner, 2004). We also briefly recall Hidden 
Markov Models and the Viterbi algorithm, and point the reader to (Viterbi, 1967; Rabiner, 1990; 
Manning and Schütze, 1999) for further details.

2.1 Weighted n-ary relations 

A weighted n-ary relation is a function from $(\Sigma^*)^n$ to $\mathbb{K}$, for a given finite alphabet $\Sigma$ and a given
weight semiring $\mathcal{K} = \langle \mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1} \rangle$. A relation assigns a weight to any n-tuple of strings. A weight of
$\bar{0}$ can be interpreted as meaning that the tuple is not in the relation. We are especially interested in
rational (or regular) n-ary relations, i.e., relations that can be encoded by n-tape weighted finite-state
machines, which we now define.

We adopt the convention that variable names referring to n-tuples of strings include a superscript
$(n)$. Thus we write $s^{(n)}$ rather than $s$ for a tuple of strings $\langle s_1, \ldots, s_n \rangle$. We also use this convention for
the names of objects that contain n-tuples of strings, such as n-tape machines and their transitions
and paths.

2.2 Multi-tape weighted finite-state machines 

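An n-tape weighted finite-state machine (WFSM or n-WFSM) $A^{(n)}$ is defined by a six-tuple
$A^{(n)} = \langle \Sigma, Q, \mathcal{K}, E^{(n)}, \lambda, \varrho \rangle$, with $\Sigma$ being a finite alphabet, $Q$ a finite set of states,
$\mathcal{K} = \langle \mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1} \rangle$ the semiring of weights, $E^{(n)} \subseteq (Q \times (\Sigma^*)^n \times \mathbb{K} \times Q)$ a finite set of
weighted n-tape transitions, $\lambda : Q \to \mathbb{K}$ a function that assigns initial weights to states, and
$\varrho : Q \to \mathbb{K}$ a function that assigns final weights to states.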

Any transition $e^{(n)} \in E^{(n)}$ has the form $e^{(n)} = \langle y, \ell^{(n)}, w, t \rangle$. We refer to these four components
as the transition's source state $y(e^{(n)}) \in Q$, its label $\ell(e^{(n)}) \in (\Sigma^*)^n$, its weight $w(e^{(n)}) \in \mathbb{K}$, and its
target state $t(e^{(n)}) \in Q$. We refer by $E(q)$ to the set of outgoing transitions of a state $q \in Q$ (with
$E(q) \subseteq E^{(n)}$).

A path $\gamma^{(n)}$ of length $k \geq 0$ is a sequence of transitions $e_1^{(n)} e_2^{(n)} \cdots e_k^{(n)}$ such that $t(e_i^{(n)}) = y(e_{i+1}^{(n)})$
for all $i \in [1, k-1]$. The label of a path is the element-wise concatenation of the labels of its transitions.
The weight of a path $\gamma^{(n)}$ is

$$ w(\gamma^{(n)}) =_{\mathrm{def}} \lambda(y(e_1^{(n)})) \otimes \Big( \bigotimes_{j \in [1,k]} w(e_j^{(n)}) \Big) \otimes \varrho(t(e_k^{(n)})) \qquad (1) $$

The path is said to be successful, and to accept its label, if $w(\gamma^{(n)}) \neq \bar{0}$.
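As a hedged illustration of Equation (1), the following sketch evaluates the weight and label of a path over the tropical semiring (min, +, ∞, 0), which is chosen here only as an example. The tuple layout of a transition and the names `path_weight` and `path_label` are assumptions made for this sketch; the empty path is not handled.

```python
from typing import Dict, List, Tuple

# A transition is (source, label_tuple, weight, target); this layout is an assumption.
Transition = Tuple[str, Tuple[str, ...], float, str]

INF = float("inf")  # semiring zero of the tropical semiring (min, +, inf, 0)


def path_weight(path: List[Transition],
                initial: Dict[str, float],
                final: Dict[str, float]) -> float:
    """Weight of a path as in Equation (1), with the semiring product taken as +."""
    if not path:
        raise ValueError("this sketch expects at least one transition")
    # Consecutive transitions must be chained: target of one = source of the next.
    for prev, nxt in zip(path, path[1:]):
        if prev[3] != nxt[0]:
            raise ValueError("transitions do not form a path")
    weight = initial.get(path[0][0], INF)      # initial weight of the first source state
    for _, _, w, _ in path:
        weight += w                            # product of the transition weights
    return weight + final.get(path[-1][3], INF)  # final weight of the last target state


def path_label(path: List[Transition]) -> Tuple[str, ...]:
    """Element-wise concatenation of the transition labels."""
    return tuple("".join(parts) for parts in zip(*(t[1] for t in path)))


example = [("q0", ("a", "x"), 1.0, "q1"), ("q1", ("b", ""), 2.5, "q0")]
print(path_weight(example, {"q0": 0.5}, {"q0": 1.0}), path_label(example))
```

A result of `INF` corresponds to the semiring zero, i.e., to a path that is not successful.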

2.3 Hidden Markov Models 

A Hidden Markov Model (HMM) is defined by a five-tuple $\langle \Sigma, Q, \Pi, A, B \rangle$, where $\Sigma = \{\sigma_k\}$ is the output
alphabet, $Q = \{q_i\}$ a finite set of states, $\Pi = \{\pi_i\}$ a vector of initial state probabilities
$\pi_i = p(x_1 = q_i) : Q \to [0,1]$, $A = \{a_{ij}\}$ a matrix of state transition probabilities
$a_{ij} = p(x_t = q_j \mid x_{t-1} = q_i) : Q \times Q \to [0,1]$, and $B = \{b_{jk}\}$ a matrix of state emission probabilities
$b_{jk} = p(o_t = \sigma_k \mid x_t = q_j) : Q \times \Sigma \to [0,1]$. A path of length $T$ in an HMM is a non-observable
(i.e., hidden) state sequence $X = x_1 \cdots x_T$, emitting an observable output sequence $O = o_1 \cdots o_T$
which is a probabilistic function of $X$.

2.4 Viterbi Algorithm 

The Viterbi algorithm finds the most likely path $\hat{X} = \arg\max_X p(X \mid O, \mu)$ for an observed output
sequence $O$ and given model parameters $\mu = (\Pi, A, B)$, using a trellis similar to that in Figure 1. It
has an $O(T |Q|^2)$ time and an $O(T |Q|)$ space complexity.
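For orientation, a minimal sketch of the classical Viterbi recursion is given below. It assumes that the model parameters are stored as nested dictionaries (`pi`, `a`, `b`), that the observation sequence is non-empty, and that plain probabilities (rather than log-probabilities) are acceptable for short inputs; the function name and data layout are illustrative only.

```python
def viterbi(obs, states, pi, a, b):
    """Most likely hidden state sequence for a non-empty observation sequence."""
    # delta[t][q]: probability of the best state sequence that ends in q at time t
    delta = [{q: pi[q] * b[q][obs[0]] for q in states}]
    back = [{}]  # back[t][q]: predecessor of q on that best sequence
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for q in states:
            # best predecessor of q: O(|Q|) work per state, O(T |Q|^2) overall
            prev = max(states, key=lambda r: delta[t - 1][r] * a[r][q])
            delta[t][q] = delta[t - 1][prev] * a[prev][q] * b[q][obs[t]]
            back[t][q] = prev
    last = max(states, key=lambda q: delta[-1][q])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):   # follow the back-pointers
        path.append(back[t][path[-1]])
    return path[::-1], delta[-1][last]
```

The two tables `delta` and `back` hold one entry per state and time step, which reflects the $O(T |Q|)$ space bound stated above.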



3 1-Tape Best-Path Search 

The Viterbi algorithm (Viterbi, 1967; Rabiner, 1990; Manning and Schütze, 1999) can be easily
adapted to search for the best of all paths of a 1-WFSM that accept a given input string.

We use a notation that will facilitate the subsequent generalization of the algorithm to n-tape best- 
path search (Section 4). Only the search for the path with minimal weight is explained. An adaptation 
to maximal weight search is trivial. 




Figure 1: Modified trellis for 1-tape best-path search (node sets ordered from the initial to the final one)



3.1 Structures 

We use a reading pointer $p \in P = \{0, \ldots, |s|\}$ that is initially positioned before the first letter of the
input string $s$, $p = 0$, and then increased with the reading of $s$ until it reaches the position after the
last letter, $p = |s|$. At any moment, $p$ equals the length of the prefix of $s$ that has already been read.

As is usual for the Viterbi algorithm, we use a trellis $\Phi = Q \times P$, consisting of nodes $\varphi = (q,p)$
which express that a state $q \in Q$ is reached after reading $p$ letters of $s$ (Figure 1). We divide the
trellis into several node sets $\Phi_p = \{\varphi = (q,p)\} \subseteq \Phi$, each corresponding to a pointer position $p$ or to
a column of the trellis. For each node $\varphi$, we maintain three variables referring to $\varphi$'s best prefix: $w_\varphi$
being its weight, $\psi_\varphi$ its last node (immediately preceding $\varphi$), and $e_\varphi$ its last transition $e \in E$ of $A^{(1)}$.
The $\psi_\varphi$ are back-pointers that fully define the best prefix of each node $\varphi$. All $w_\varphi$, $\psi_\varphi$, and $e_\varphi$ are
initially undefined ($= \bot$).¹



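FsaViterbi($s$, $A^{(1)}$) $\to \gamma$ :                                   [ $\gamma = e_1 \cdots e_r$ ]
 1  $\Phi_{\mathrm{initial}} \leftarrow \emptyset$                                    [ $\Phi_{\mathrm{initial}} = \Phi_0$ ]
 2  for $\forall q \in Q : \lambda(q) \neq \bar{0}$ do
 3      $\varphi \leftarrow (q, 0)$ ;  $w_\varphi \leftarrow \lambda(q)$ ;  $\Phi_{\mathrm{initial}} \leftarrow \Phi_{\mathrm{initial}} \cup \{\varphi\}$
 4  $\boldsymbol{\Phi} \leftarrow \{\Phi_{\mathrm{initial}}\}$
 5  for $p = 0, \ldots, |s|-1$ do
 6      for $\forall \varphi = (q,p) \in \Phi_p$ do
 7          for $\forall e \in E(q)$ do
 8              if $\exists u, v \in \Sigma^* : u\,\ell(e)\,v = s \wedge p = |u|$
 9              then $p' \leftarrow p + |\ell(e)|$
10                   $\varphi' \leftarrow (t(e), p')$ ;  $w' \leftarrow w_\varphi \otimes w(e)$
11                   $\boldsymbol{\Phi} \leftarrow \boldsymbol{\Phi} \cup \{\Phi_{p'}\}$
12                   $\Phi_{p'} \leftarrow \Phi_{p'} \cup \{\varphi'\}$
13                   if $w_{\varphi'} = \bot \vee w_{\varphi'} > w'$
14                   then $w_{\varphi'} \leftarrow w'$ ;  $\psi_{\varphi'} \leftarrow \varphi$ ;  $e_{\varphi'} \leftarrow e$
15  $\hat{\varphi} \leftarrow \arg\min_{\varphi = (q,p) \in \Phi_{\mathrm{final}}} (w_\varphi \otimes \varrho(q))$                [ $\Phi_{\mathrm{final}} = \Phi_{|s|}$ ]
16  $\gamma \leftarrow$ getPath($\hat{\varphi}$)
17  return $\gamma$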



Figure 2: Pseudocode of 1-tape best-path search 



3.2 Algorithm 

The algorithm FsaViterbi( ) returns from all paths $\gamma$ of the 1-WFSM $A^{(1)}$ that accept the string
$s$, the one with minimal weight (Figure 2). $A^{(1)}$ must not contain any transitions labeled with $\varepsilon$ (the
empty string). At least a partial order must be defined on the semiring of weights. Nothing else is
required concerning the labels, weights, or structure of $A^{(1)}$.²

The algorithm starts by creating an initial node set $\Phi_{\mathrm{initial}} = \Phi_0$ for the initial position $p = 0$
of the reading pointer. The set $\Phi_{\mathrm{initial}}$ contains a node for each initial state of $A^{(1)}$ (Lines 1-3). The
prefix weights $w_\varphi$ of these nodes are set to the initial weight $\lambda(q)$ of the respective states $q$. The set
of node sets $\boldsymbol{\Phi}$ contains only $\Phi_{\mathrm{initial}}$ at this point (Line 4).

¹ The variables $w_\varphi$, $\psi_\varphi$, and $e_\varphi$ can be formally regarded as elements of the vectors $w$, $\psi$, and $e$, respectively, that
are indexed by values of $\varphi$. In a practical implementation it is, however, sensible to store these variables directly on
the node that they refer to.

² Cycles are, e.g., not required to have non-negative weights (as for Dijkstra's algorithm) because all paths of interest
are constrained by the input string.

In the subsequent iteration (Lines 5-14), ranging from the first to the last-but-one pointer position,
$p = 0, \ldots, |s|-1$, we inspect all outgoing transitions $e \in E(q)$ of all states $q \in Q$ for which there is a node
$\varphi = (q,p)$ in $\Phi_p$. If the label $\ell(e)$ of $e$ matches $s$ at position $p$ (Line 8), we create a new node $\varphi' = (t(e),p')$
for the target $t(e)$ of $e$ (Line 10). Its prefix weight $w'$ equals the current node's weight $w_\varphi$ multiplied
by the weight $w(e)$ of $e$. The node set $\Phi_{p'}$ for the new $p'$ is created and inserted into the set of node
sets $\boldsymbol{\Phi}$ (if it does not exist yet; Line 11). Then $\varphi'$ is inserted into $\Phi_{p'}$ (if it is not yet a member of
it; Line 12). If the prefix weight of $\varphi'$ is still undefined, $w_{\varphi'} = \bot$ (because no prefix of $\varphi'$ has been
analyzed yet), or if it is higher than the weight of the currently analyzed new prefix, $w_{\varphi'} > w'$, then
the variables $w_{\varphi'}$, $\psi_{\varphi'}$, and $e_{\varphi'}$ of $\varphi'$ are assigned the values of the new prefix (Lines 13-14).

The algorithm terminates by selecting the node $\hat{\varphi}$, corresponding to the path with the minimal
weight, from the final node set $\Phi_{\mathrm{final}} = \Phi_{|s|}$. This weight is the product of the node's prefix weight $w_\varphi$
and the final weight $\varrho(q)$ of the corresponding state $q \in Q$ (Line 15). The function getPath( ) identifies
the best path $\gamma$ by following all back-pointers $\psi_\varphi$ from the node $\hat{\varphi} \in \Phi_{\mathrm{final}}$ to some node $\varphi \in \Phi_{\mathrm{initial}}$,
and collecting all transitions $e = e_\varphi$ it encounters. Finally, $\gamma$ is returned.
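The steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: weights live in the tropical semiring (min, +), the machine is given as plain dictionaries, and the trellis is a list of dictionaries indexed by the pointer position; none of these choices is prescribed by the pseudocode in Figure 2.

```python
def fsa_viterbi(s, transitions, initial, final):
    """Minimal-weight path of a 1-tape machine over (min, +) that accepts s.

    transitions: dict state -> list of (label, weight, target), labels non-empty
    initial, final: dict state -> weight (a missing state has the semiring zero)
    Returns (weight, list of transitions) or None if s is not accepted.
    """
    # trellis[p][q] = (prefix weight, previous node, last transition)
    trellis = [dict() for _ in range(len(s) + 1)]
    for q, w in initial.items():
        trellis[0][q] = (w, None, None)
    for p in range(len(s)):
        for q, (w_node, _, _) in list(trellis[p].items()):
            for label, w, target in transitions.get(q, []):
                if not label:
                    raise ValueError("epsilon labels are not supported in this sketch")
                if not s.startswith(label, p):
                    continue  # the label does not match s at position p
                p2, w2 = p + len(label), w_node + w
                old = trellis[p2].get(target)
                if old is None or old[0] > w2:   # undefined or worse prefix
                    trellis[p2][target] = (w2, (q, p), (q, label, w, target))
    # select the best node of the final node set, including final weights
    ends = [(w + final[q], q) for q, (w, _, _) in trellis[len(s)].items() if q in final]
    if not ends:
        return None
    best_w, q = min(ends)
    path, node = [], (q, len(s))
    while trellis[node[1]][node[0]][1] is not None:  # follow the back-pointers
        _, prev, e = trellis[node[1]][node[0]]
        path.append(e)
        node = prev
    return best_w, path[::-1]


machine = {0: [("a", 1, 1), ("a", 3, 0), ("ab", 5, 1), ("b", 0, 1)], 1: [("b", 1, 1)]}
print(fsa_viterbi("ab", machine, {0: 0}, {1: 0}))
# prints (2, [(0, 'a', 1, 1), (1, 'b', 1, 1)])
```

Because every label is non-empty, a node set is never modified while it is being iterated over, which is what makes the simple left-to-right loop over pointer positions sufficient in the 1-tape case.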

3.3 e-Transitions 

The algorithm can be extended to allow for $\varepsilon$-transitions (but not for $\varepsilon$-cycles). The source and target
node, $\varphi$ and $\varphi'$, of an $\varepsilon$-transition would be in the same $\Phi_p$. If $\varphi' = (q', p')$ is actually inserted into
$\Phi_p$ (Line 12) or if its variables $w_{\varphi'}$, $\psi_{\varphi'}$, and $e_{\varphi'}$ change their values (Lines 13-14), then we have
to (re-)"include" $\varphi'$ into the iteration over all nodes of the currently inspected $\Phi_p$ (Line 6). The
algorithm will still terminate since there can be only finite sequences of $\varepsilon$-transitions (as long as we
have no $\varepsilon$-cycles).

3.4 Best transduction 

The algorithm FsaViterbi( ) can be used for computing the best transduction of a given input string
$s$ by a 2-WFSM (weighted transducer). For this, we identify the best path $\gamma$ accepting $s$ on its input
tape and take the label of $\gamma$'s output tape as the best output string $v$.

4 n-Tape Best-Path Search 

We now come to the central topic of this paper: the generalization of the Viterbi algorithm for
searching for the best of all paths of an n-WFSM, $A^{(n)}$, that accept a given n-tuple of input strings,
$s^{(n)} = \langle s_1, \ldots, s_n \rangle$. This requires relatively few modifications to the structures and
algorithm explained above (Section 3).

4.1 Structures 

The main difference with respect to the previous structures is that now our reading pointer is a vector of n
natural integers, $p^{(n)} = \langle p_1, \ldots, p_n \rangle \in (\{0, \ldots, |s_1|\} \times \cdots \times \{0, \ldots, |s_n|\}) \subset \mathbb{N}^n$. The pointer is initially
positioned before the first letter of each $s_i$ ($\forall i \in [1,n]$), $p^{(n)} = \langle 0, \ldots, 0 \rangle$. Its elements $p_i$ are then
increased according to the non-synchronized reading of the $s_i$ on the tapes $i$ ($\forall i \in [1,n]$), until the
pointer reaches its final position after the last letter of each $s_i$, $p^{(n)} = \langle |s_1|, \ldots, |s_n| \rangle$.

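More precisely, a pointer is an element of the monoid $\langle \mathbb{N}^n, +, \mathbf{0} \rangle$ with $+$ being vector addition
and $\mathbf{0}$ the vector of n 0's. We have a partial order of pointers. Let $\sqsubset : \mathbb{N}^n \times \mathbb{N}^n \to \{\mathrm{true}, \mathrm{false}\}$.
Let $a, b \in \mathbb{N}^n$, then $a \sqsubset b \Leftrightarrow (\exists c \in \mathbb{N}^n, c \neq \mathbf{0} : a + c = b)$. We say $a$ precedes $b$. It holds that
$a \sqsubset b \Rightarrow (\sum_{i=1}^{n} a_i < \sum_{i=1}^{n} b_i)$ where $a_i$ and $b_i$ are the vector elements.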
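The precedence relation on pointers can be checked component-wise, as the following small sketch shows; the function name `precedes` and the use of Python tuples for pointers are assumptions of this illustration.

```python
def precedes(a, b):
    """True if pointer a precedes b, i.e. b = a + c for some non-zero c in N^n."""
    if len(a) != len(b):
        raise ValueError("pointers must have the same arity")
    # a + c = b with c in N^n means a_i <= b_i on every tape; c != 0 means a != b.
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


# (1, 0) and (0, 1) are incomparable: neither precedes the other.
print(precedes((0, 1), (1, 1)), precedes((1, 0), (0, 1)), precedes((0, 1), (1, 0)))
# prints True False False
```

Whenever `precedes(a, b)` holds, `sum(a) < sum(b)` holds as well, which is the implication stated above; the converse fails for incomparable pointers such as (0, 2) and (3, 0).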

In the trellis (Figure 3) we still have one node set $\Phi_{p^{(n)}}$ per pointer position $p^{(n)}$, a single initial
node set $\Phi_{\mathrm{initial}} = \Phi_{\langle 0, \ldots, 0 \rangle}$ and a single final node set $\Phi_{\mathrm{final}} = \Phi_{\langle |s_1|, \ldots, |s_n| \rangle}$. There are, however, several
node sets in parallel between the two (corresponding to pointers $p^{(n)}$, $p'^{(n)}$ not preceding each other,
i.e., $p^{(n)} \not\sqsubset p'^{(n)} \wedge p'^{(n)} \not\sqsubset p^{(n)}$).




Figure 3: Modified trellis for n-tape best-path search 



4.2 Algorithm 

The algorithm FsmViterbi( ) returns from all paths $\gamma^{(n)}$ of the n-WFSM $A^{(n)}$ that accept the string
tuple $s^{(n)}$, the one with minimal weight (Figure 4). $A^{(n)}$ must not contain any transitions labeled
with $\langle \varepsilon, \ldots, \varepsilon \rangle$.³

The initial node set $\Phi_{\mathrm{initial}} = \Phi_{\langle 0, \ldots, 0 \rangle}$ is created as before, and inserted into the set of node sets $\boldsymbol{\Phi}$
(Lines 1-4). In addition, it is inserted into a Fibonacci heap⁴ $H$ (Line 4) (Fredman and Tarjan, 1987).
This heap contains node sets that have not yet been processed, and uses $\sum_{i=1}^{n} p_i$ as sorting key.

The subsequent iteration continues as long as $H$ is not empty (Lines 5-16). The function
extractMinElement( ) extracts the (or a) minimal element $\Phi_{p^{(n)}}$ from $H$ (Line 6). Due to our sorting key,
none of the remaining $\Phi_{p'^{(n)}}$ in $H$ is a predecessor of $\Phi_{p^{(n)}}$: $\forall \Phi_{p'^{(n)}} \in H,\ p'^{(n)} \not\sqsubset p^{(n)}$. This property
prevents the computation of suffixes of a $\Phi_{p^{(n)}}$ that has some not yet analyzed prefixes (which could
lead to wrong choices). The extracted $\Phi_{p^{(n)}}$ is handled almost as in the previous algorithm (Figure 2).
Transition labels $\ell(e^{(n)})$ are required to match a factor of $s^{(n)}$ at position $p^{(n)}$ (Line 9). New
$\Phi_{p'^{(n)}}$ are inserted into both $\boldsymbol{\Phi}$ and $H$ (Lines 12-13).
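A compact sketch of this n-tape search is given below. It is an illustration under explicit assumptions rather than the implementation used for the tests in this paper: weights are taken from the tropical semiring (min, +), the machine is given as plain dictionaries, and Python's `heapq` (a binary heap) stands in for the Fibonacci heap, with the sum of the pointer components as sorting key.

```python
import heapq


def fsm_viterbi(strings, transitions, initial, final):
    """Minimal-weight path of an n-tape machine over (min, +) accepting `strings`.

    transitions: dict state -> list of (label_tuple, weight, target);
                 labels with all components empty are not supported
    Returns (weight, list of transitions) or None.
    """
    n = len(strings)
    start, goal = (0,) * n, tuple(len(s) for s in strings)
    # trellis[pointer][state] = (prefix weight, previous node, last transition)
    trellis = {start: {q: (w, None, None) for q, w in initial.items()}}
    heap = [(0, start)]  # unprocessed node sets, keyed by the sum of pointer components
    while heap:
        _, p = heapq.heappop(heap)
        for q, (w_node, _, _) in list(trellis[p].items()):
            for label, w, target in transitions.get(q, []):
                if not any(label):
                    raise ValueError("(eps, ..., eps) labels are not supported here")
                # every tape's label component must match its string at position p_i
                if not all(s.startswith(l, i) for s, l, i in zip(strings, label, p)):
                    continue
                p2 = tuple(i + len(l) for i, l in zip(p, label))
                if p2 not in trellis:            # new node set: add to trellis and heap
                    trellis[p2] = {}
                    heapq.heappush(heap, (sum(p2), p2))
                old = trellis[p2].get(target)
                if old is None or old[0] > w_node + w:
                    trellis[p2][target] = (w_node + w, (q, p), (q, label, w, target))
    ends = [(w + final[q], q)
            for q, (w, _, _) in trellis.get(goal, {}).items() if q in final]
    if not ends:
        return None
    best_w, q = min(ends)
    path, node = [], (q, goal)
    while trellis[node[1]][node[0]][1] is not None:  # follow the back-pointers
        _, prev, e = trellis[node[1]][node[0]]
        path.append(e)
        node = prev
    return best_w, path[::-1]


# A one-state 2-tape machine: matching symbols cost 0, one-sided symbols cost 1.
T = {0: [(("a", "a"), 0, 0), (("b", "b"), 0, 0), (("a", ""), 1, 0),
         (("b", ""), 1, 0), (("", "a"), 1, 0), (("", "b"), 1, 0)]}
print(fsm_viterbi(("ab", "b"), T, {0: 0}, {0: 0}))
# prints (1, [(0, ('a', ''), 1, 0), (0, ('b', 'b'), 0, 0)])
```

Since every label advances at least one tape, each target pointer has a strictly larger component sum than its source, so a node set popped from the heap can no longer receive new prefixes.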

4.3 Best transduction 

The algorithm FsmViterbi( ) can be used for obtaining from a weighted (n+m)-WFSM (transducer)
with n input and m output tapes, the best transduction of a given input n-tuple $s^{(n)}$. For this, we
identify the best path $\gamma^{(n+m)}$ accepting $s^{(n)}$ on its n input tapes and take the label of $\gamma$'s m output
tapes as the best output m-tuple $v^{(m)}$. Input and output tapes can be in any order.

³ The algorithm can be extended to allow for $\langle \varepsilon, \ldots, \varepsilon \rangle$-transitions (but not for $\langle \varepsilon, \ldots, \varepsilon \rangle$-cycles) as described in Section 3.

⁴ Alternatively, one could use a binary heap. Tests on a concrete example have, however, shown that the algorithm
performs slightly better with a Fibonacci heap (Table 1).
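
FsmViterbi($s^{(n)}$, $A^{(n)}$) $\to \gamma^{(n)}$ :
 1  $\Phi_{\mathrm{initial}} \leftarrow \emptyset$                                    [ $\Phi_{\mathrm{initial}} = \Phi_{\langle 0, \ldots, 0 \rangle}$ ]
 2  for $\forall q \in Q : \lambda(q) \neq \bar{0}$ do
 3      $\varphi \leftarrow (q, \langle 0, \ldots, 0 \rangle)$ ;  $w_\varphi \leftarrow \lambda(q)$ ;  $\Phi_{\mathrm{initial}} \leftarrow \Phi_{\mathrm{initial}} \cup \{\varphi\}$
 4  $\boldsymbol{\Phi} \leftarrow \{\Phi_{\mathrm{initial}}\}$ ;  $H \leftarrow \{\Phi_{\mathrm{initial}}\}$
 5  while $H \neq \emptyset$ do
 6      $\Phi_{p^{(n)}} \leftarrow$ extractMinElement($H$)
 7      for $\forall \varphi = (q, p^{(n)}) \in \Phi_{p^{(n)}}$ do
 8          for $\forall e^{(n)} \in E(q)$ do
 9              if $\exists u^{(n)}, v^{(n)} \in (\Sigma^*)^n : u^{(n)} \ell(e^{(n)}) v^{(n)} = s^{(n)} \wedge p^{(n)} = \langle |u_1|, \ldots, |u_n| \rangle$
10              then $p'^{(n)} \leftarrow p^{(n)} + \langle |(\ell(e^{(n)}))_1|, \ldots, |(\ell(e^{(n)}))_n| \rangle$
11                   $\varphi' \leftarrow (t(e^{(n)}), p'^{(n)})$ ;  $w' \leftarrow w_\varphi \otimes w(e^{(n)})$
12                   if $\Phi_{p'^{(n)}} \notin \boldsymbol{\Phi}$
13                   then $\boldsymbol{\Phi} \leftarrow \boldsymbol{\Phi} \cup \{\Phi_{p'^{(n)}}\}$ ;  $H \leftarrow H \cup \{\Phi_{p'^{(n)}}\}$
14                   $\Phi_{p'^{(n)}} \leftarrow \Phi_{p'^{(n)}} \cup \{\varphi'\}$
15                   if $w_{\varphi'} = \bot \vee w_{\varphi'} > w'$
16                   then $w_{\varphi'} \leftarrow w'$ ;  $\psi_{\varphi'} \leftarrow \varphi$ ;  $e_{\varphi'} \leftarrow e^{(n)}$
17  $\hat{\varphi} \leftarrow \arg\min_{\varphi = (q, p^{(n)}) \in \Phi_{\mathrm{final}}} (w_\varphi \otimes \varrho(q))$          [ $\Phi_{\mathrm{final}} = \Phi_{\langle |s_1|, \ldots, |s_n| \rangle}$ ]
18  $\gamma^{(n)} \leftarrow$ getPath($\hat{\varphi}$)
19  return $\gamma^{(n)}$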



Figure 4: Pseudocode of n-tape best-path search 



4.4 Complexity 

The trellis (Figure 3) consists of at most $|P| = \prod_{i=1}^{n} (|s_i| + 1)$ node sets $\Phi_{p^{(n)}} \in \boldsymbol{\Phi}$. Assuming
approximately equal length $|s|$ for all $s_i$ of $s^{(n)}$, we can simplify: $|P| \approx (|s| + 1)^n$. For each node set
$\Phi_{p^{(n)}}$ we have to create at most $|Q|$ nodes $\varphi \in \Phi_{p^{(n)}}$, which leads to an $O(|s|^n |Q|)$ space complexity for
our algorithm.

Each $\Phi_{p^{(n)}}$ is extracted once from the Fibonacci heap $H$ in $O(\log |P|)$ time. We analyze for $\Phi_{p^{(n)}}$ at
most $|E|$ transitions $e \in E$ of $A^{(n)}$. For the target of each $e$ we find a $\Phi_{p'^{(n)}} \in \boldsymbol{\Phi}$ in $O(\log |P|)$ time and
a node $\varphi' \in \Phi_{p'^{(n)}}$ in $O(\log |Q|)$ time. Thus, FsmViterbi( ) has a worst-case overall time complexity
of $O(|P| (\log |P| + |E| (\log |P| + \log |Q|))) = O(|P| |E| \log(|P| |Q|)) = O(|s|^n |E| \log(|s|^n |Q|))$.

An HMM has exactly one transition per state pair, so that $|E| = |Q|^2$, and an arity of $n = 1$. There
would also never be more than one $\Phi_{p^{(n)}}$ on the heap, extractable in constant time. In this case, our
algorithm has an $O(|s| |Q|)$ space and an $O(|s| |Q|^2)$ time complexity, as has the classical version of the
Viterbi algorithm (Section 2).

5 Example: Word Alignment 

In this section we illustrate our n-tape best path search on a practical example: the alignment of word 
pairs. 

Suppose we want to create a (non-weighted) transducer, $D^{(2)}$, from a list of word pairs of the
form (inflected form, lemma), e.g., (swum, swim), such that each path of the transducer is labeled with
one of the pairs. We want to use only transition labels of the form $(\sigma, \sigma)$, $(\sigma, \varepsilon)$, or $(\varepsilon, \sigma)$ ($\forall \sigma \in \Sigma$),
while keeping paths as short as possible. For example, (swum, swim) should be encoded either by
the sequence (s, s)(w, w)(u, ε)(ε, i)(m, m) or by (s, s)(w, w)(ε, i)(u, ε)(m, m), rather than by the ill-formed
(s, s)(w, w)(u, i)(m, m), or the sub-optimal (s, ε)(w, ε)(u, ε)(m, ε)(ε, s)(ε, w)(ε, i)(ε, m). To achieve this,
we perform for each word pair an alignment based on minimal edit distance.



5.1 Standard solution with edit distance matrix 

A well-known standard solution for word alignment is based on edit distance, which is a string similarity
measure defined as the minimum cost needed to convert one string into another (Wagner and Fischer, 
1974; Pirkola et al., 2003). 

For two words, $a = a_1 \ldots a_n$ and $b = b_1 \ldots b_m$, the edit distance can be computed with a matrix
$X = \{x_{i,j}\}$ ($i \in [0,n]$, $j \in [0,m]$) (Figures 5 and 6). A horizontal move in $X$ at a cost $c_I$ expresses an
insertion, a vertical move at a cost $c_D$ a deletion, and a diagonal move at a cost $c_S$ a substitution if
$a_i \neq b_j$ or no edit operation if $a_i = b_j$. We set $c_I = c_D = 1$, $c_S = \infty$ for $a_i \neq b_j$ (to disable substitutions),
and $c_S = 0$ for $a_i = b_j$. The element $x_{0,0}$ is set to 0 and all other $x_{i,j}$ to
$\min(x_{i,j-1} + c_I,\ x_{i-1,j} + c_D,\ x_{i-1,j-1} + c_S)$, insofar as these choices are available, proceeding top-down and
left-to-right. The choices made to go from $x_{0,0}$ to $x_{n,m}$ describe the set of paths with (the same) minimal cost.
Each of these paths defines a sequence of edit operations for transforming $a$ into $b$.

The algorithm has $O(|a| |b|)$ time and space complexity.
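The matrix computation and the recovery of one minimal-cost alignment can be sketched as follows. The sketch assumes unit insertion and deletion costs and disabled substitutions, as set above; the function names, the use of `@` as a gap symbol, and the tie-breaking order in the trace-back are choices of this illustration.

```python
INF = float("inf")


def edit_distance(a, b, c_ins=1, c_del=1, c_sub=INF):
    """Edit distance matrix; substitutions are disabled by default (c_sub = infinity)."""
    x = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    x[0][0] = 0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i > 0:
                x[i][j] = min(x[i][j], x[i - 1][j] + c_del)     # vertical move: deletion
            if j > 0:
                x[i][j] = min(x[i][j], x[i][j - 1] + c_ins)     # horizontal move: insertion
            if i > 0 and j > 0:
                diag = 0 if a[i - 1] == b[j - 1] else c_sub
                x[i][j] = min(x[i][j], x[i - 1][j - 1] + diag)  # diagonal move
    return x


def align(a, b):
    """Trace one minimal-cost path back from the last cell (default unit costs)."""
    x = edit_distance(a, b)
    i, j, top, bottom = len(a), len(b), [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and x[i][j] == x[i - 1][j - 1]:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and x[i][j] == x[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("@"); i -= 1
        else:
            top.append("@"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


print(edit_distance("swum", "swim")[4][4], align("swum", "swim"))
# prints 2 ('sw@um', 'swi@m')
```

With a different tie-breaking order in `align`, the other minimal-cost alignment of the same pair would be returned instead.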



Figure 5: Edit distance matrix $X = \{x_{i,j}\}$ for the target word "swim"
(choices are indicated by arrows; minimum
cost paths by thick arrows and circles)



$x_{0,0} \leftarrow 0$
for $i = 1 \ldots |a|$ do
    $x_{i,0} \leftarrow x_{i-1,0} + c_D$
for $j = 1 \ldots |b|$ do
    $x_{0,j} \leftarrow x_{0,j-1} + c_I$
for $i = 1 \ldots |a|$ do
    for $j = 1 \ldots |b|$ do
        $m_D \leftarrow x_{i-1,j} + c_D$
        $m_I \leftarrow x_{i,j-1} + c_I$
        $m_S \leftarrow x_{i-1,j-1} + c_S$
        $x_{i,j} \leftarrow \min(m_D, m_I, m_S)$



Figure 6: Pseudocode of compiling an 
edit distance matrix 



5.2 Solution with 2-tape best path search 

Alternatively, word alignment can be performed by best path search on an n-WFSM, such as $A^{(5)}$,
generated from the expression (Isabelle and Kempe, 2004)

$$ A^{(5)} = \big( \langle \langle ?, ?, ?, ?, K \rangle_{\{1=2=3=4\}}, 0 \rangle \;\cup\; \langle \langle \varepsilon, ?, @, ?, I \rangle_{\{2=4\}}, 1 \rangle \;\cup\; \langle \langle ?, \varepsilon, ?, @, D \rangle_{\{1=3\}}, 1 \rangle \big)^* \qquad (2) $$

where ? can be instantiated by any symbol $\sigma \in \Sigma$, @ is a special symbol representing $\varepsilon$ in an alignment,
$\{1=2=3=4\}$ a constraint requiring the ?'s on tapes 1 to 4 to be instantiated by the same symbol
(Nicart et al., 2006),⁵ and 0 and 1 are weights over the semiring $\langle \mathbb{N} \cup \{\infty\}, \min, +, \infty, 0 \rangle$.

Input word pairs $s^{(2)} = \langle s_1, s_2 \rangle$ will be matched on tape 1 and 2, and aligned output word pairs
generated from tape 3 and 4. A symbol pair $\langle ?, ? \rangle$ read on tape 1 and 2 is identically mapped to $\langle ?, ? \rangle$
on tape 3 and 4, a $\langle \varepsilon, ? \rangle$ is mapped to $\langle @, ? \rangle$, and a $\langle ?, \varepsilon \rangle$ to $\langle ?, @ \rangle$. $A^{(5)}$ will introduce @'s in $s_1$ (resp.
in $s_2$) at positions where $D^{(2)}$ shall have $\langle \varepsilon, \sigma \rangle$- (resp. $\langle \sigma, \varepsilon \rangle$-) transitions. (Later, we simply replace
in $D^{(2)}$ all @ by $\varepsilon$.)

⁵ Roughly following (Kempe, Champarnaud, and Eisner, 2004), we employ here a simpler notation for constraints
than in (Nicart et al., 2006).

Thus, we obtain the full set of all possible alignments between $s_1$ and $s_2$. The best alignment is the
one with the lowest weight. For example, (swum, swim) is mapped to a set of alignments, including the
two best ones, (sw@um, swi@m) and (swu@m, sw@im), both with weight 2. The (or a) best alignment
can be found without generating all alignments, by means of our n-tape best path search (with n = 2).

So far, we have not used tape 5. It can serve for excluding certain paths. For example, joining $A^{(5)}$
on tape 5 with a machine (Kempe et al., 2005a; Kempe et al., 2005b) built from the expression
$\neg(?^*\, I\, D\, ?^*)$, which prohibits an insertion (I) from being immediately followed by a deletion (D), would leave
only (swu@m, sw@im) as a best path.

The 5-WFSM from Equation (2) has 1 state and 3 transitions. Input is read on 2 tapes. Our
algorithm works on this example with a worst-case time complexity of
$O(|s_1| |s_2| \cdot 3 \cdot \log(|s_1| |s_2| \cdot 1)) = O(|s_1| |s_2| \log(|s_1| |s_2|))$ and a worst-case space complexity of
$O(|s_1| |s_2| \cdot 1) = O(|s_1| |s_2|)$.

5.3 Test results 

We tested our n-tape best-path algorithm on the alignment of the German word pair (gemacht, machen)
(English: (done, do)), leading to (gemacht@@, @@mach@en). We repeated this test for the word pairs
$\langle s_1^r, s_2^r \rangle$ with $s_1$ = "gemacht" and $s_2$ = "machen", and $r \in [1,8]$.⁶



 r      A       B       C       D
 1      1       1       1     1.056
 2      4     4.12    5.48    1.041
 3      9     9.41    14.3    1.057
 4     16     17.1    27.9    1.029
 5     25     27.2    46.5    1.059
 6     36     39.8    70.5    1.016
 7     49     54.1    100     1.005
 8     64     70.8    135     1.006



Table 1: Test results for word pair alignment with 2-tape best path search 

The columns of Table 1 show for different $r$:

(A) an estimated time ratio of $r^2$ for the classical approach with an edit distance matrix,

(B) the measured time ratio for 2-tape best path search (with respect to 3.93 milliseconds for $r = 1$) using a
Fibonacci heap,

(C) an estimated worst-case time ratio of $\frac{7r \cdot 6r \cdot \log(7r \cdot 6r)}{7 \cdot 6 \cdot \log(7 \cdot 6)} = r^2 \left(1 + \frac{2 \log r}{\log 42}\right)$, corresponding to the
worst-case complexity of $O(7r \cdot 6r \log(7r \cdot 6r))$ for the two words of length $7r$ and $6r$, respectively, and

(D) the measured time increase factor when using a binary instead of a Fibonacci heap.

Comparing columns A and B shows that, on this example, our algorithm has a time complexity slightly above
$O(r^2) = O(|s_1| |s_2|)$, which is much lower than the worst-case time complexity in column C.
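The estimates in columns A and C can be recomputed directly from their definitions, as in the following small sketch; the function name `worst_case_ratio` and its default word lengths (7 and 6, for "gemacht" and "machen") are assumptions of this illustration. The base of the logarithm cancels in the ratio.

```python
import math


def worst_case_ratio(r, len1=7, len2=6):
    """Ratio of |s1||s2| log(|s1||s2|) for the r-fold repeated words to the case r = 1."""
    base = len1 * len2
    return (base * r * r * math.log(base * r * r)) / (base * math.log(base))


for r in range(1, 9):
    # columns: r, estimate A (r squared), estimate C
    print(r, r * r, round(worst_case_ratio(r), 2))
```

Rounded to three significant digits, the last column agrees with column C of Table 1.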

⁶ For example, for $r = 2$ we have (gemachtgemacht, machenmachen).

6 An Alternative Approach



A well-known straightforward alternative to the above n-tape best-path search on an n-WFSM $A^{(n)}$
is to intersect $A^{(n)}$ with an n-WFSM $I^{(n)}$ containing a single path labeled with the input n-tuple
$s^{(n)}$, and then to apply a classical shortest-distance algorithm, ignoring the labels.

6.1 Intersection 

The intersection $B^{(n)} = I^{(n)} \cap A^{(n)}$ can be computed as the join $I^{(n)} \bowtie_{\{1=1, \ldots, n=n\}} A^{(n)}$ (Kempe,
Champarnaud, and Eisner, 2004). In general, it has undecidable emptiness and rationality (Rabin
and Scott, 1959). In our case, however, with $A^{(n)}$ being $\langle \varepsilon, \ldots, \varepsilon \rangle$-cycle free and $I^{(n)}$ acyclic, it is
always rational, even for non-commutative semirings.⁷

Actually, the trellis in Figure 3 corresponds partially to $B^{(n)}$. Each node $\varphi \in \Phi$ corresponds to a
state $q \in Q_B$ of $B^{(n)}$ (and vice versa); however, only those transitions $e \in E_B$ of $B^{(n)}$ that correspond
to a state's best prefix occur as "best transitions" $e_\varphi$ in $\Phi$.⁸

From this analogy we deduce that computing the intersection $B^{(n)}$ has a worst-case time and space
complexity of $O(|P| |E| \log(|P| |Q|))$, with $|P| = (|s| + 1)^n$, equal to the time complexity for constructing
the trellis. The result, $B^{(n)}$, has at most $\nu \leq |P| |Q|$ states and $\mu \leq |P| |E|$ transitions.

6.2 Shortest-distance algorithms 

Since any n-WFSM with multiple initial states can be transformed into one with a single initial state,
we can use any algorithm that solves a single-source shortest-distance problem, such as Dijkstra's
algorithm (Dijkstra, 1959) combined with Fibonacci heaps (Fredman and Tarjan, 1987), which operates
in $O(\mu + \nu \log \nu)$ time, or the Bellman-Ford algorithm (Bellman, 1958; Ford and Fulkerson, 1956),
operating in $O(\mu \nu)$ time, with $\nu$ being the number of states and $\mu$ the number of transitions.

Recently, it has been shown that any single-source shortest-distance algorithm on directed graphs
has a lower bound of $\Omega(\mu + \min(\nu \log \nu, \nu \log \rho))$ where $\rho$ is the ratio of the maximal to minimal
transition weight (Pettie, 2003). Since we cannot make any assumption concerning $\rho$ in general,
we consider $\Omega(\mu + \nu \log \nu)$ as a "worst-case lower bound". It equals the upper bound of Dijkstra's
algorithm.

On the intersection $B^{(n)} = I^{(n)} \cap A^{(n)}$, Dijkstra's algorithm requires $O(|P| |E| + |P| |Q| \log(|P| |Q|))$
time, and Bellman-Ford's $O(|P|^2 |E| |Q|)$ time, in the worst case. The sets $E$ and $Q$ refer to $A^{(n)}$.
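For comparison, the two-step alternative can be sketched for the 1-tape case as below. The sketch assumes non-negative weights over the tropical semiring (min, +), as required by Dijkstra's algorithm, builds only the reachable part of the intersection, and uses Python's `heapq` (a binary heap) instead of a Fibonacci heap; the function name and data layout are illustrative.

```python
import heapq


def intersect_then_dijkstra(s, transitions, initial, final):
    """Intersect a 1-tape machine with the single-path acceptor of s, then run Dijkstra.

    transitions: dict state -> list of (label, weight, target), weights >= 0
    Returns the minimal accepting weight, or None if s is not accepted.
    """
    # Step 1: build the intersection B; its states are pairs (state, position in s).
    edges = {}
    stack = [(q, 0) for q in initial]
    seen = set(stack)
    while stack:
        node = stack.pop()
        q, p = node
        for label, w, target in transitions.get(q, []):
            if label and s.startswith(label, p):
                nxt = (target, p + len(label))
                edges.setdefault(node, []).append((w, nxt))  # every matching transition is kept
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    # Step 2: classical single-source shortest distance on B, ignoring the labels.
    dist = {(q, 0): w for q, w in initial.items()}
    heap = [(w, node) for node, w in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for w, nxt in edges.get(node, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    ends = [dist[(q, len(s))] + f for q, f in final.items() if (q, len(s)) in dist]
    return min(ends) if ends else None


machine = {0: [("a", 1, 1), ("a", 3, 0), ("ab", 5, 1), ("b", 0, 1)], 1: [("b", 1, 1)]}
print(intersect_then_dijkstra("ab", machine, {0: 0}, {1: 0}))
# prints 2
```

In contrast to the trellis, step 1 stores all matching transitions of the intersection, not only those on a best prefix, before the shortest-distance search in step 2 visits them again.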

6.3 Complete estimate 

Intersection and Dijkstra's algorithm together have a worst-case time complexity of
$O(|P| |E| \log(|P| |Q|) + |P| |E| + |P| |Q| \log(|P| |Q|)) = O(|P| (|E| + |Q|) \log(|P| |Q|))$. For intersection and
the Bellman-Ford algorithm it is $O(|P| |E| \log(|P| |Q|) + |P|^2 |E| |Q|) = O(|P| |E| (|P| |Q| + \log(|P| |Q|)))$.
Both combinations exceed the complexity of our algorithm.

This result is not surprising, since building only the trellis $\Phi$ should take less time than building
the intersection $B^{(n)}$ (which is a kind of "superset" of $\Phi$) and then performing a best-path search.

⁷ The intersection of two n-WFSMs over non-commutative semirings is in general not rational (even for n = 1).

⁸ Due to this analogy, one can easily derive an n-tape intersection (or join) algorithm, for precisely our case, from the
algorithm in Figure 4. Trellis nodes would become states of the resulting n-WFSM. All of their incoming transitions
would be constructed, rather than only those that correspond to a best prefix. The state set would be partitioned
like the trellis. The Fibonacci heap can be replaced by a stack (which does not decrease the overall time complexity),
because the order in which partitions are treated would be irrelevant.

7 Conclusion



We presented an algorithm for identifying the path with minimal (resp. maximal) weight in a given
n-tape weighted finite-state machine (n-WFSM), $A^{(n)}$, that accepts a given n-tuple of input strings,
$s^{(n)} = \langle s_1, \ldots, s_n \rangle$. This problem is of particular interest because it also allows us to compute the best
transduction of a given input n-tuple $s^{(n)}$ by a weighted (n+m)-WFSM (transducer), $A^{(n+m)}$, with
n input and m output tapes. For this, we identify the best path accepting $s^{(n)}$ on its n input tapes,
and take the label of its output tapes as the best output m-tuple $v^{(m)}$. (Input and output tapes can be
in any order.)

Our algorithm is a generalization of the Viterbi algorithm, which is generally used for detecting
the most likely path in a Hidden Markov Model (HMM) for an observed sequence of symbols emitted
by the HMM. In the worst case, it operates in $O(|s|^n |E| \log(|s|^n |Q|))$ time, where $n$ and $|s|$ are the
number and average length of the strings in $s^{(n)}$, and $|Q|$ and $|E|$ the number of states and transitions
of $A^{(n)}$, respectively.

We illustrated our n-tape best path search on a practical example, the alignment of word pairs
(i.e., n = 2), and provided test results that show a time complexity slightly higher than $O(|s|^2)$.

Finally, we discussed a straightforward alternative approach for solving our problem, which consists
of intersecting $A^{(n)}$ with an n-WFSM $I^{(n)}$ that has a single path labeled with the input n-tuple
$s^{(n)}$, and then applying a classical shortest-distance algorithm, ignoring the labels. This has, however,
a worst-case time complexity of $O(|s|^n (|E| + |Q|) \log(|s|^n |Q|))$, which is higher than that of our
algorithm.